Fig. 2A (Prior Art)
Web page (200): title "NY Times" (205); text "All work and no play makes Jack a dull boy." (210)



Fig. 2B (Prior Art)
Web Page File Structure (215): Front_Page.htm and the folder Front_Page_Files, containing Star.gif (235) and Text_Box.txt (250)
Web Page HTML Source File (220): <img src ...> and <txt src= ...> references to the supporting files
Other reference numerals: 225, 230, 240, 245




FIG. 3 (flowchart 300)
Start (305)
User navigates to location with Web page
User right-clicks mouse on Web page icon and chooses "Save as Web Archive"
Progress dialog displayed to user
Parse through HTML page searching for links to supporting files and gathering a list of supporting files (350)
Pack supporting files and main HTML page file into MHTML file (355)
Other reference numerals: 310, 315, 320



FIG. 4
from step 340 (Fig. 3)
Search for "src", "lowsrc" and "dynsrc" files referenced in <img> tags
Search for "Background" files referenced in <body> tags
Search for "src" files referenced in <script> tags
Search for "src" files referenced in <bgsound> tags
[box partly illegible, ending in "tags"]
Search for "src" files referenced in <embed> tags
Search for "href" files referenced in <link rel="Stylesheet"> tags
Decision (first words illegible): "... at step 430?" (Yes / No)
Search for "@import url(...)" files in stylesheets
Search for "src" files referenced in <frame> tags
Decision: "Any files found at step 455?" (Yes / No)
Load HTML files and recursively examine
Search for "href" files referenced in <link rel=filelist> tags
Parse through "filelist.xml" searching for all "HRef" files referenced in <o:File> tag
To step 355 (Fig. 3)
Reference numerals: 405, 410, 415, 420, 425, 430, 435, 440, 455, 460, 470, 480
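The search in FIG. 4 can be sketched in Python with the standard `html.parser` module. This is a minimal sketch, not the patented implementation. Several details are assumptions: the `load` callback, all function names, and accepting Office's `rel=File-List` spelling alongside the figure's `rel=filelist`. Relative URLs are returned as written and are not resolved against the page or stylesheet that contains them.

```python
import re
from html.parser import HTMLParser
from xml.etree import ElementTree

# Attributes examined per tag, following the search boxes of FIG. 4.
TAG_ATTRS = {
    "img": ("src", "lowsrc", "dynsrc"),
    "body": ("background",),
    "script": ("src",),
    "bgsound": ("src",),
    "embed": ("src",),
    "frame": ("src",),
}
# Matches @import url(...) with optional single or double quotes.
IMPORT_RE = re.compile(r"""@import\s+url\(\s*['"]?([^'")\s]+)['"]?\s*\)""", re.I)


class SupportingFileFinder(HTMLParser):
    """Collects the URLs an HTML page depends on, in document order."""

    def __init__(self):
        super().__init__()
        self.files, self.stylesheets, self.frames, self.filelists = [], [], [], []

    def _add(self, url, bucket=None):
        if not url:
            return
        if url not in self.files:
            self.files.append(url)
        if bucket is not None and url not in bucket:
            bucket.append(url)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        for name in TAG_ATTRS.get(tag, ()):
            self._add(a.get(name), self.frames if tag == "frame" else None)
        if tag == "link":
            rel = (a.get("rel") or "").lower()
            if rel == "stylesheet":
                self._add(a.get("href"), self.stylesheets)
            elif rel in ("filelist", "file-list"):  # "file-list" is an assumption
                self._add(a.get("href"), self.filelists)


def stylesheet_imports(css_text):
    """URLs named in @import url(...) rules."""
    return IMPORT_RE.findall(css_text)


def filelist_hrefs(xml_text):
    """HRef attributes of <o:File> elements in a filelist.xml."""
    root = ElementTree.fromstring(xml_text)
    return [el.get("HRef") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "File" and el.get("HRef")]


def find_supporting_files(html, load=None, _seen=None):
    """Return supporting-file URLs for a page.

    `load(url) -> str` is a hypothetical callback returning the text of a
    stylesheet, frame page or filelist.xml; without it only the page's own
    tags are searched.
    """
    seen = set() if _seen is None else _seen
    finder = SupportingFileFinder()
    finder.feed(html)
    finder.close()
    found = list(finder.files)
    if load is not None:
        for css in finder.stylesheets:
            found += stylesheet_imports(load(css))
        for frame in finder.frames:  # load HTML files and recursively examine
            if frame not in seen:
                seen.add(frame)
                found += find_supporting_files(load(frame), load, seen)
        for filelist in finder.filelists:
            found += filelist_hrefs(load(filelist))
    return list(dict.fromkeys(found))  # drop duplicates, keep first occurrence
```

Supplying a `load` that reads from disk enables the stylesheet, frame, and filelist.xml branches. The `seen` set keeps a frame that includes itself, directly or indirectly, from recursing forever.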



FIG. 5
from step 350 (Fig. 3)
Call INETCOMM.DLL API to pull together list of supporting files and main source file into an MHTML file (505)
Create file called "Webpage(webarchive).mht" (510)
to step 360 (Fig. 3)
Other reference numeral: 355
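FIG. 5 relies on the Windows INETCOMM.DLL API. As a portable stand-in, Python's standard `email` package can build the same multipart/related structure shown in FIG. 7. This swaps in a different technique and is not the API the figure names. The function name and the use of base64 for every supporting file are assumptions; FIG. 7, for example, stores filelist.xml as 7bit.

```python
import mimetypes
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from pathlib import Path


def pack_web_archive(main_page, supporting_files, out_path):
    """Pack an HTML page and its supporting files into one MHTML file."""
    main_page = Path(main_page).resolve()
    # multipart/related with a "type" parameter; MIMEBase also adds MIME-Version: 1.0.
    archive = MIMEMultipart("related", type="text/html")
    archive.preamble = "This is a multi-part message in MIME format."

    # Main HTML page first, so it is the root part; iso-8859-1 text is
    # encoded as quoted-printable by the email package.
    html = MIMEText(main_page.read_text(encoding="iso-8859-1"), "html", "iso-8859-1")
    html["Content-Location"] = main_page.as_uri()
    archive.attach(html)

    # Each supporting file becomes a base64 part tagged with its original location.
    for name in supporting_files:
        path = Path(name).resolve()
        ctype, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(path.read_bytes())
        encoders.encode_base64(part)
        part["Content-Location"] = path.as_uri()
        archive.attach(part)

    Path(out_path).write_bytes(archive.as_bytes())
```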









FIG. 6A
Explorer window, address C:\WINDOWS\Desktop\Pa, showing the folder "Page_files" and the Web page "Page"







FIG. 6B
Explorer window (menus File, Edit, View, Go; Back and Forward buttons; address C:\WINDOWS\Desktop\Pa) showing the folder "Page_files" and a context menu containing "Save as Web Archive"



FIG. 7
MIME-Version: 1.0
Content-Type: multipart/related;
    boundary="----=_NextPart_000_0000_01BF4561.A9B32F20"
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01BF4561.A9B32F20
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Location: file:///C:/WINDOWS/Desktop/Pages/Page.htm

(content for the HTML file, including a link to "image001.gif" and ...)

------=_NextPart_000_0000_01BF4561.A9B32F20
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Location: file:///C:/WINDOWS/Desktop/Pages/Page_files/image001.gif

(content for the image inside of "Page_files")

------=_NextPart_000_0000_01BF4561.A9B32F20
Content-Type: text/xml; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Location: file:///C:/WINDOWS/Desktop/Pages/Page_files/filelist.xml

(content for the "filelist.xml" inside of "Page_files")
------=_NextPart_000_0000_01BF4561.A9B32F20--
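An archive laid out like FIG. 7 can be checked with the standard `email` parser, which walks the parts and undoes each Content-Transfer-Encoding. This is a minimal sketch; `list_archive_parts` is a hypothetical name, and the function only reads the structure without saving anything.

```python
import email
from email import policy


def list_archive_parts(mhtml_bytes):
    """Return (content type, Content-Location, decoded body) for each MHTML part."""
    msg = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    if msg.get_content_type() != "multipart/related":
        raise ValueError("not an MHTML (multipart/related) archive")
    parts = []
    for part in msg.iter_parts():
        location = part["Content-Location"]
        parts.append((
            part.get_content_type(),
            None if location is None else str(location),
            part.get_payload(decode=True),  # undoes quoted-printable / base64
        ))
    return parts
```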













FIG. 8
Explorer window, address C:\WINDOWS\Desktop\Pa, showing the folder "Page_files" and the file "Page (web archive)"



FIG. 9 (flowchart 900)
User right-clicks on MHTML file and selects "Unpack Web archive"
Progress dialog displayed to user
Determine name of main HTML Web page
Determine locations of supporting files
Decision: "Any of these locations already in use?" (Yes / No)
Convert each MIME part into HTML file and save
Reference numerals: 910, 915, 920, 925, 930, 940, 945
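The unpacking flow of FIG. 9 might be sketched as follows, under these assumptions:

- The first part is taken as the main page. This is the default root of a multipart/related message under RFC 2387 when there is no `start` parameter.
- The figure does not show what happens when a location is in use, so this sketch raises `FileExistsError` unless `overwrite=True`.
- The default `locate` function keeps only the file name.
- Links inside the saved HTML are not rewritten.

All names are hypothetical.

```python
import email
from email import policy
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def file_name_only(location):
    """Default placement (an assumption): keep just the file name of the URL."""
    return PurePosixPath(unquote(urlparse(location).path)).name


def unpack_web_archive(mhtml_bytes, dest_dir, locate=file_name_only, overwrite=False):
    """Save every MIME part of an MHTML archive under dest_dir.

    Returns the path of the main HTML page (assumed to be the first part).
    """
    msg = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    parts = list(msg.iter_parts())
    if not parts:
        raise ValueError("archive contains no parts")

    # Determine the location of the main page and of every supporting file.
    dest = Path(dest_dir)
    targets = []
    for part in parts:
        location = part["Content-Location"]
        if location is None:
            raise ValueError("part without Content-Location")
        targets.append(dest / locate(str(location)))

    # Are any of these locations already in use?
    in_use = [t for t in targets if t.exists()]
    if in_use and not overwrite:
        raise FileExistsError("already in use: " + ", ".join(map(str, in_use)))

    # Convert each MIME part back into a file and save it.
    for part, target in zip(parts, targets):
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(part.get_payload(decode=True))
    return targets[0]
```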



FIG. 10
Content-Location: file:///C:/Folder1/Page.htm
Content-Location: file:///C:/Folder1/Folder2/Folder3/Image1.gif
Content-Location: file:///C:/Image2.gif
Resulting files:
  Page.htm
  Folder2
    Folder3
      Image1.gif
  Page_files
    Image2.gif
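One reading of FIG. 10 is this: a part whose Content-Location lies at or below the main page's folder keeps its relative path, and a part outside that folder (Image2.gif under C:/) goes into a "Page_files" folder. The sketch below encodes that reading. The naming rule (`<page stem>_files`) is inferred from FIG. 6A and FIG. 10, and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def _url_path(location):
    """Path component of a file:/// URL, e.g. /C:/Folder1/Page.htm."""
    return PurePosixPath(unquote(urlparse(location).path))


def local_placement(main_location, part_location):
    """Where a part is written, relative to the folder receiving the main page.

    Parts at or below the main page's folder keep their relative path;
    anything else goes into a "<page stem>_files" folder.
    """
    main = _url_path(main_location)
    part = _url_path(part_location)
    try:
        return part.relative_to(main.parent)
    except ValueError:  # part lies outside the main page's folder
        return PurePosixPath(main.stem + "_files") / part.name
```

The comparison is case-sensitive, unlike Windows paths. Two outside files with the same name would also collide in Page_files. A fuller version would normalize case and de-duplicate names.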



